(54) Method for detecting eye and mouth positions in a digital image



(57) A digital image processing method for locating eyes and
mouth in a digital face image. The method includes the steps of
detecting iris colored pixels in the digital face image; grouping
the iris colored pixels into clusters; detecting eye positions
using the iris colored pixels; identifying salient pixels relating
to a facial feature in the digital face image; generating a
signature curve using the salient pixels; and using the signature
curve and the eye positions to locate a mouth position. In a
preferred embodiment, a summation of squared difference method is
used to detect the eye positions. In another preferred embodiment,
the eyes and mouth positions are validated using statistics.
Description

[0001] The present invention relates to digital image 
processing methods, and more particularly to methods 
of detecting human eye and mouth positions. 
[0002] In digital image processing it is often useful to
find the eye-mouth coordination, that is, to detect/locate
an eye and mouth position. This information can be
used, for example, to find the pose of a human face in
the image. Since human faces may often be distinguished
by their features, eye-mouth coordination also
can be used as a pre-processor for applications such as 
face recognition that is further used in image retrieval. 
[0003] U.S. Patent No. 6,072,892 (Kim) which issued 
June 6, 2000 discloses an eye position detecting appa- 
ratus and method. The disclosed method for detecting 
the position of eyes in a facial image uses a thresholding 
method on an intensity histogram of the image to find 
three peaks in the histogram representing skin, white of 
the eye, and pupil.

[0004] While this method may have achieved a cer- 
tain degree of success in its particular application, one 
of the problems with this method is that it needs to scan 
the entire image pixel by pixel and position a search win- 
dow at each pixel. As such, it consumes enormous com- 
puting power. Further, it may also produce a high rate 
of false positives because similar histogram patterns occur
in places other than eye regions.
[0005] In "Using color and geometric models for extracting
facial features", Journal of Imaging Science and Technology,
Vol. 42, No. 6, pp. 554-561, 1998, Tomoyuki
Ohtsuki of Sony Corporation proposed a region seg- 
mentation method to find mouth candidates. However, 
a region segmentation, in general, is very sensitive to 
luminance and chromaticity variations, and therefore 
very unstable. 

[0006] Accordingly, a need continues to exist for a 
method of utilizing information embedded in a digital fa- 
cial image to determine human eye-mouth coordination 
in a robust, yet computationally efficient manner. 
[0007] An object of the present invention is to provide 
a digital image processing method for locating eyes and 
mouth in a digital face image. 

[0008] Still another object of the present invention is 
to provide such a method which is effective for automat- 
ically obtaining eye and mouth positions in a frontal face 
image. 

[0009] Yet another object of the present invention is
to provide such a method which reduces the region of
the image that must be searched.

[0010] Another object of the present invention is to
provide such a method which reduces the computation
required to locate the eye and mouth.

[0011] A still further object of the present invention is
to provide such a method which reduces the incidence
of false positives.

[0012] These objects are given only by way of illustrative
example. Thus, other desirable objectives and advantages
inherently achieved by the disclosed invention may occur or
become apparent to those skilled in the art. The invention is
defined by the appended claims.
[0013] According to one aspect of the invention, there
is provided a digital image processing method for locating
eyes and mouth in a digital face image. The method
includes the steps of detecting iris colored pixels in the
digital face image; grouping the iris colored pixels into
clusters; detecting eye positions using the iris colored
pixels; identifying salient pixels relating to a facial feature
in the digital face image; generating a signature
curve using the salient pixels; and using the signature
curve and the eye positions to locate a mouth position.
In a preferred embodiment, a summation of squared difference
method is used to detect the eye positions. In
another preferred embodiment, the eyes and mouth positions
are validated using statistics.
[0014] The present invention provides a method
which is effective for automatically obtaining eye and
mouth positions in a frontal face image. The method reduces
the region of the image that must be searched,
thereby reducing the computation required to locate eye
and mouth, and reducing the incidence of false positives.


FIG. 1 shows a schematic diagram of an image
processing system suitable for use with a method
in accordance with the present invention;
FIG. 2 shows a flow diagram illustrating the method
of determining eye-mouth coordination in accordance
with the present invention;
FIG. 3 is an illustration showing parameters of a human
face region;
FIG. 4(a) shows a plot representing iris and noniris
pixel intensity distributions;
FIG. 4(b) shows a flow diagram illustrating the process
of Bayesian iris modeling;
FIG. 5 shows a flow diagram showing eye position
estimation steps;
FIG. 6 is an illustration showing iris color pixel clusters;
FIG. 7 shows a flow diagram illustrating the summation
of squared difference used in eye template matching;
FIG. 8 is a view of an eye template in searching eye
patches in an image;
FIG. 9(a) shows a flow diagram for finding mouth position;
FIG. 9(b) shows a kernel;
FIG. 9(c) is a view of facial salient pixels and their
projection onto the vertical axis;
FIG. 9(d) is an illustration of a lower half and upper
half of a face region;
FIG. 9(e) shows a plot representative of a signature
curve and peak points; and
FIG. 9(f) shows an illustration of parameters M, E,
and D.
[0015] Figure 1 shows an image processing system
50 suitable for use with a method in accordance with the
present invention. System 50 includes a digital image
source 100 adapted to provide a color digital still image.
Examples of digital image source 100 include a scanner
or other device for capturing images and converting the
image for storage in digital form, a digital image capture
device such as a digital camera, and a digital image storage
device such as a memory card or compact disk drive
with a CD.

[0016] The digital image is a facial image, preferably
a frontal view, though the image may be angled from a
frontal view.

[0017] The digital image from digital image source 
100 is provided to an image processor 102, such as a 
programmable personal computer, or digital image 
processing work station such as a Sun Sparc worksta- 
tion. Image processor 102 processes the digital image 
in accordance with the method of the present invention. 
[0018] As illustrated in Figure 1, image processor 102
may be networked/connected to a CRT display or image 
display 104, an interface device or other data/command 
entry device such as a keyboard 106, and a data/com- 
mand control device such as a mouse 108. Image proc- 
essor 102 may also be networked/connected to a com- 
puter readable storage medium 107. 
[0019] Image processor 102 transmits the processed
digital images to an output device 109. Output device
109 can comprise a printer, a long-term image storage
device, a connection to another processor, or an image
telecommunication device connected, for example, to
the Internet. A printer, in accordance with the present
invention, can be a silver halide printer, thermal printer,
inkjet printer, electrophotographic printer, and the like.
[0020] In the following description, a preferred embodiment
of the present invention is described as a
method. In another preferred embodiment, described
below, the present invention comprises a computer program
for detecting human eyes and mouths in a digital
image in accordance with the method described. As
such, in describing the present invention, it should be
apparent that the computer program of the present invention
can be utilized by any computer system known
to those skilled in the art, such as the personal computer
system of the type shown in Figure 1. Accordingly, many
other types of computer systems can be used to execute
the computer program of the present invention.
[0021] It will be understood that the computer program
of the present invention may employ image manipulation
algorithms and processes that are known to
those skilled in the art. As such, the computer program 
embodiment of the present invention may embody con- 
ventional algorithms and processes not specifically 
shown or described herein that are useful for implemen- 
tation. 

[0022] Other aspects of such algorithms and systems,
and hardware and/or software for producing and otherwise
processing the images involved or co-operating
with the computer program of the present invention, are
not specifically shown or described herein and may be
selected from such algorithms, systems, hardware,
components and elements known in the art.

[0023] The computer program for performing the
method of the present invention may be stored in computer
readable storage medium 107. Medium 107 may
comprise, for example, magnetic storage media such
as a magnetic disk (e.g., a hard drive or a floppy disk)
or magnetic tape; optical storage media such as an
optical disc, optical tape, or machine readable bar code;
a solid state electronic storage device such as random
access memory (RAM) or read only memory (ROM); or
any other physical device or medium employed to store
a computer program. The computer program for performing
the method of the present invention may also
be stored on computer readable storage medium 107
connected to image processor 102 by means of the Internet
or other communication medium. Those skilled in
the art will readily recognize that the equivalent of such
a computer program may also be constructed in hardware.

[0024] Turning now to Figure 2, the method of the
present invention will be described in detail. Figure 2 is
a flow diagram illustrating a first embodiment of the
method in accordance with the present invention of determining
eye-mouth coordination. In the first embodiment
shown in Figure 2, eye-mouth coordinate determination
comprises several steps. A first step (step 200)
comprises detecting skin color regions (i.e., face regions)
in the digital image. A second step (step 206)
comprises identifying iris color pixels from the face regions.
A third step (step 208) comprises estimating eye
positions from the detected iris color pixels of second
step 206. A fourth step (step 212) comprises identifying/
extracting salient pixels in the face region and forming
a signature curve with the salient pixels. A fifth step (step
214) comprises estimating a mouth position based on
the information gathered in third step 208 and fourth
step 212.

[0025] A modeling step (step 216) comprises iris color
Bayesian model training, which provides second
step 206 with a look-up table for detecting
iris color pixels. Modeling step 216 is more particularly
described below with regard to Figures 4(a) and
4(b). Further, modeling step 216 is performed once,
preferably off-line.

[0026] First step 200 in skin color region detection
comprises three steps as illustrated in Figure 2, specifically,
steps 201, 202, and 203. As illustrated in Figure
2, step 201 is a color histogram equalization step. Color
histogram equalization step 201 receives images to be
processed and ensures that the images are in a form
that will permit skin color detection. Step 201 is employed
since human skin may take on any number of
colors in an image because of lighting conditions, flash
settings or other circumstances. As such, it is generally
difficult to automatically detect skin in such images. In color






EP 1 255 225 A2 



histogram equalization step 201, a statistical analysis of
each image is performed. If the statistical analysis suggests
that the image may contain regions of skin that
have had their appearance modified by lighting conditions,
flash settings or other circumstances, then such
images are modified so that skin colored regions can be
detected. The color histogram equalization of the digital
face image is preferably performed based on a mean
[remainder of sentence illegible in the source].
[0027] After color histogram equalization step 201,
the image is searched for skin color regions in skin color 
detection step 202. While it is possible to detect skin in 
a digital image in a number of ways, a preferred method 
for detecting skin in a digital image is the method that is 
described in commonly assigned and co-pending patent 
application U.S. Serial No. 09/692,930, incorporated 
herein by reference. In this preferred method, skin color 
pixels are separated from other pixels by defining a 
working color space that contains a range of possible 
skin colors collected from a large, well-balanced popu- 
lation of images. A pixel is then identified as a skin color 
pixel if the pixel has a color that is within the working 
color space. 

[0028] Skin color detection step 202 identifies a re- 
gion of skin color pixels in the image. This region can 
be defined in any number of ways. In one embodiment, 
the skin color region is defined by generating a set of 
pixel locations identifying the pixels in the image having 
skin colors. In another embodiment, a modified image 
is generated that contains only skin color pixels. In yet 
another embodiment, skin color detection step 202 de- 
fines boundaries that confine the skin color region in the 
image. It will be recognized by those skilled in the art 
that more than one skin color region can be identified in 
the image. 

[0029] Face region extraction step 203 examines the 
skin color regions detected by skin color detection step 
202 to locate skin color regions that may be indicative 
of a face. Face region extraction step 203 defines pa- 
rameters that describe the size of the face and the lo- 
cation of the face within the image. 
[0030] Figure 3 more particularly illustrates the rela- 
tionship between geometric parameters used to define 
a face region in the image. As shown in Figure 3, geo- 
metric parameters may include a Face_top 300, 
Face_bottom 302, Face_left 304, Face_right 306,
Face_center_row 308, and Face_center_column 310. 
These parameters can be used in subsequent process- 
ing of the image. 

[0031] Once face region extraction step 203 has been
performed, second step 206, i.e., the iris color pixel detection
step, examines the pixels in the face region to
detect iris color pixels. In the method in accordance with
the present invention, second step 206 determines
whether a pixel is an iris by measuring the red intensity
of the pixel. Red intensity levels are measured since it
has been observed that a human iris has a low red
intensity level as compared to human skin, which has a
relatively high red intensity level. However, the method
in accordance with the present invention does not use
a red level thresholding method to determine whether a
pixel is to be classified as an iris or as a non-iris.
[0032] Rather, the method of the present invention
classifies a pixel as an iris or a non-iris pixel on the basis
of a probability analysis. This probability analysis applies
an iris statistical model. The iris statistical model
defines the probability that a pixel is an iris pixel based
on the given red intensity level of the pixel. To construct
the iris statistical model, two conditional probability density
distribution functions are needed. Figure 4(a) shows
the two conditional probability density distribution functions.
Iris intensity distribution function P(I | iris) 412 represents
the likelihood that a given iris pixel has a specific
red intensity. For example, the likelihood that a given iris
pixel has a red intensity level of 30 is 0.5, while the
likelihood that the same pixel has a red intensity level of
255 is 0.0001. Noniris intensity distribution function
P(I | noniris) 414 represents the likelihood that a given
noniris pixel has a specific red intensity. For example,
the likelihood that a given noniris pixel has a red intensity
level of 30 is 0.0001, while the likelihood that the same
pixel has a red intensity level of 255 is 0.1. The maximum
value of a likelihood is one.
[0033] The probability analysis can take many forms.
For example, the probabilities can be combined in various
ways with a pixel being classified as an iris or not
on the basis of the relationship between these probabilities.
However, in a preferred embodiment, a mathematical
construct known as a Bayes model is employed to
combine the probabilities to produce the posterior probability
that a pixel having a given red intensity belongs
to an iris.

[0034] In this preferred embodiment, the Bayes model
is applied as follows:

$$P(\text{iris} \mid I) = \frac{P(I \mid \text{iris})\,P(\text{iris})}{P(I \mid \text{iris})\,P(\text{iris}) + P(I \mid \text{noniris})\,P(\text{noniris})},$$

where P(iris | I) is a conditional probability that a given
pixel intensity belongs to an iris; P(I | iris) is a conditional
probability that a given iris pixel has a specific intensity
I (i.e., iris intensity distribution function 412); P(iris) is a
probability of the occurrence of an iris in the face region;
P(I | noniris) is a conditional probability that a given noniris
pixel has a specific intensity I (i.e., noniris intensity
distribution function 414); and P(noniris) is a probability
of the occurrence of a non-iris pixel in the face oval region.
Using a probability analysis based on the Bayes
model, a pixel is classified as an iris if the conditional
probability P(iris | I) that a pixel having a given red intensity
belongs to an iris is greater than a pre-determined
value, for example, 0.05.
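The Bayes rule above can be illustrated with a minimal Python sketch. The function names, the prior value of 0.02, and the assumption that P(noniris) = 1 - P(iris) within the face region are illustrative choices, not values taken from the specification; the likelihood values 0.5, 0.0001 and 0.1 and the 0.05 threshold are the example figures quoted in the text.

```python
def iris_posterior(p_i_given_iris, p_i_given_noniris, p_iris):
    """Return P(iris | I) from the two likelihoods and the iris prior.

    Assumption: every pixel in the face region is either iris or noniris,
    so P(noniris) = 1 - P(iris).
    """
    p_noniris = 1.0 - p_iris
    numerator = p_i_given_iris * p_iris
    denominator = numerator + p_i_given_noniris * p_noniris
    return numerator / denominator if denominator > 0 else 0.0


def is_iris(p_i_given_iris, p_i_given_noniris, p_iris, threshold=0.05):
    """Classify a pixel as iris when the posterior exceeds the threshold."""
    return iris_posterior(p_i_given_iris, p_i_given_noniris, p_iris) > threshold


# Hypothetical prior: 2% of face-region pixels are iris pixels.
PRIOR = 0.02
dark_pixel = is_iris(0.5, 0.0001, PRIOR)     # red intensity 30 in the text's example
bright_pixel = is_iris(0.0001, 0.1, PRIOR)   # red intensity 255 in the text's example
```

With these example numbers the low-red pixel is accepted and the high-red pixel is rejected, which matches the observation that irises have low red intensity.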

[0035] In the embodiment described above, only
those pixels in the face region defined by Face_top 300,
Face_bottom 302, Face_left 304, and Face_right 306
are examined. Confining the pixels to be examined to
those in the face region reduces the number of pixels to






be examined and decreases the likelihood that pixels 
that are not irises will be classified as such. It will be 
recognized that shapes other than the one described 
above can be used to model the human face and that 
parameters that are appropriate to such shapes are 
used in subsequent processing of the image. 
[0036] Further, it will be understood that iris pixels can 
be detected from a skin color region in an image without 
first detecting face boundaries or other shaped area. In 
such a case, each pixel of the skin color region is exam- 
ined to detect iris color pixels and parameters defining 
the skin colored region are used later in the eye detec- 
tion process. 

[0037] Figure 4(b) shows a flow diagram illustrating
the processes used in modeling step 216, that is, iris
color Bayesian model training of Figure 2, for developing
the statistical models used to classify the pixels. Modeling
step 216 is performed before the method for detecting
irises is used to detect iris pixels. As is shown in Figure
4(b), a large sample of frontal face images is collected
and examined. All iris pixels and non-iris pixels
in the face region of each image are then manually identified
(steps 402 and 404). Next, the conditional probability
that a given iris pixel has a specific red intensity I,
P(I | iris), is computed and the probability of the occurrence
of an iris in the face oval region, P(iris), is computed
(step 406); then the conditional probability that a
given noniris pixel has a specific red intensity I,
P(I | noniris), is computed and finally the probability of the
occurrence of a non-iris pixel in the face oval region,
P(noniris), is computed (step 408). The computed statistical
models of iris and non-iris are used in the Bayes formula
to produce the conditional probability that a pixel with a
given intensity belongs to an iris, P(iris | I) (step 410). In
application, the Bayes model can be used to generate
a look-up table to be used in second step 206 for iris
color pixel detection. Second step 206, the iris color pixel
detection step, identifies the location of the iris color
pixels in the image. The result from second step 206 is
an iris color pixel image in which noniris color pixels are
set as zeros.
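The training procedure of steps 402 to 410 can be sketched as follows, assuming the manually labeled pixels are supplied as two lists of 8-bit red intensities. The function name `train_iris_lut` and the histogram-based estimate of the likelihoods are assumptions made for illustration; the specification does not say how the densities are estimated.

```python
def train_iris_lut(iris_reds, noniris_reds, levels=256):
    """Build a look-up table mapping red intensity -> P(iris | I).

    iris_reds / noniris_reds: red intensities (0..levels-1) of manually
    labeled iris and noniris pixels (hypothetical training input).
    """
    n_iris, n_non = len(iris_reds), len(noniris_reds)
    if n_iris == 0 or n_non == 0:
        raise ValueError("both classes need at least one labeled pixel")

    # Priors: relative frequency of each class among the labeled pixels.
    p_iris = n_iris / (n_iris + n_non)
    p_non = n_non / (n_iris + n_non)

    # Likelihoods P(I | class) estimated with normalized histograms.
    iris_hist = [0] * levels
    non_hist = [0] * levels
    for r in iris_reds:
        iris_hist[r] += 1
    for r in noniris_reds:
        non_hist[r] += 1

    lut = []
    for level in range(levels):
        num = (iris_hist[level] / n_iris) * p_iris
        den = num + (non_hist[level] / n_non) * p_non
        lut.append(num / den if den > 0 else 0.0)  # unseen level -> 0
    return lut


lut = train_iris_lut([20, 20, 30], [200, 200, 200, 30])
```

At detection time a pixel with red intensity `r` would be kept as an iris color pixel when `lut[r]` exceeds the chosen threshold, which avoids evaluating the Bayes formula per pixel.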

[0038] The iris color pixel image resulting from second 
step 206 is used in third step 208. Third step 208 is now 
more particularly described with regard to Figures 5 and 
6. 

[0039] Figure 5 shows a flow diagram illustrating third
step 208, the process of eye position detection using the
iris color pixels. As is shown in Figure 5, the eye position
detection process starts with an iris color pixel clustering
step 500. If iris color pixels are detected, then the iris
pixels are assigned to a cluster. A cluster is a non-empty
set of iris color pixels with the property that any pixel
within the cluster is also within a predefined distance to
another pixel in the cluster. One example of a predefined
distance is one thirtieth of the digital image height. Iris
color pixel clustering step 500 of Figure 5 groups iris
color pixels into clusters based upon this definition of a
cluster. However, it will be understood that pixels may
be clustered on the basis of other criteria.
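One way to realize this cluster definition is a breadth-first grouping in which a pixel joins a cluster whenever it lies within the predefined distance of a pixel already in it. The sketch below is an illustration under assumptions: pixels are given as (row, column) tuples, Euclidean distance is used, and an isolated pixel is kept as a singleton cluster. None of these details are stated in the specification.

```python
from collections import deque


def cluster_pixels(pixels, max_dist):
    """Group (row, col) pixels into distance-connected clusters.

    Two pixels end up in the same cluster when a chain of pixels links
    them in which consecutive pixels are at most max_dist apart.
    Simple O(n^2) sketch, adequate for a small number of iris pixels.
    """
    unvisited = set(pixels)
    limit = max_dist * max_dist  # compare squared distances
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster = [seed]
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            near = [p for p in unvisited
                    if (p[0] - r) ** 2 + (p[1] - c) ** 2 <= limit]
            for p in near:
                unvisited.remove(p)
                cluster.append(p)
                queue.append(p)
        clusters.append(cluster)
    return clusters


# Example: image height 60 gives a predefined distance of 60 / 30 = 2.
pixels = [(0, 0), (0, 1), (1, 1), (10, 10), (10, 11)]
clusters = cluster_pixels(pixels, max_dist=60 / 30)
```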
[0040] Under certain circumstances, a cluster of pixels
may not be valid. Accordingly, an optional step of
validating the clusters is shown in Figure 5 as iris color
pixel cluster validation step 501. A cluster may be
invalid, for example, if it contains too many iris color pixels
or because the geometric relationship of the pixels
in the cluster suggests that the cluster is not indicative
of an iris. For example, the height to width ratio of each
iris pixel cluster is determined, and the iris pixel cluster
is invalid if the height to width ratio is greater than a
pre-determined value, for example two. A size measure
might also be considered. That is, a size of each iris pixel
cluster can be determined by counting the number of iris
colored pixels within each iris pixel cluster; and the iris
pixel cluster is invalid if the size of the iris pixel cluster is
greater than a predetermined value. Invalid iris color pixel
clusters are removed from further consideration by
the method of the present invention. Accordingly, for
ease of discussion, in the portions of the description that
follow, valid iris color pixel clusters will hereinafter be
referred to as iris pixel clusters.
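The two validation criteria can be sketched as a single predicate. The use of the bounding box to measure height and width, the function name, and the optional `max_size` parameter are assumptions for illustration; only the example ratio limit of two comes from the text.

```python
def is_valid_cluster(cluster, max_ratio=2.0, max_size=None):
    """Return False for clusters that are too tall or contain too many pixels.

    cluster: list of (row, col) iris color pixels.
    Height and width are taken from the bounding box (an assumption).
    """
    rows = [r for r, _ in cluster]
    cols = [c for _, c in cluster]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    if height / width > max_ratio:
        return False
    if max_size is not None and len(cluster) > max_size:
        return False
    return True


tall = [(0, 0), (1, 0), (2, 0)]            # height/width = 3
square = [(0, 0), (0, 1), (1, 0), (1, 1)]  # height/width = 1
valid = [c for c in (tall, square) if is_valid_cluster(c)]
```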

[0041] After iris color pixel clustering step 500, a center
for each of the clusters is calculated in cluster centering
step 502. The center of a cluster is determined as
the center of mass of the cluster. The center position of
the clusters is calculated with respect to the origin of the
image coordinate system. The origin of the image coordinate
system for a digital image is typically defined at
the upper left corner of the image boundary.
[0042] A face division step 504 employs
Face_center_column 310 to separate the skin color region
into a left-half region and a right-half region. As is
shown in Figure 6, iris color pixel cluster 602 and cluster
center 600 of the iris pixel clusters are positioned in either
a left-half region 604 or a right-half region 606 separated
by Face_center_column 310.
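Cluster centering step 502 and face division step 504 can be sketched together. The sketch assumes unweighted pixels (so the center of mass is the mean of the coordinates), coordinates measured from the upper left corner, and that the left-half region consists of columns smaller than Face_center_column; the last point is an assumption about the figure, not something the text states.

```python
def cluster_center(cluster):
    """Center of mass of a cluster of (row, col) pixels, all weighted equally."""
    n = len(cluster)
    return (sum(r for r, _ in cluster) / n, sum(c for _, c in cluster) / n)


def split_by_center_column(centers, face_center_column):
    """Assign cluster centers to the left-half or right-half region.

    Assumption: 'left' means column index below face_center_column.
    """
    left = [p for p in centers if p[1] < face_center_column]
    right = [p for p in centers if p[1] >= face_center_column]
    return left, right


centers = [cluster_center([(4, 9), (6, 11)]), cluster_center([(5, 30)])]
left_half, right_half = split_by_center_column(centers, face_center_column=20)
```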

[0043] Referring again to Figure 5, to locate eyes in
the image using the iris pixel clusters, a left-eye position
search step 506 is conducted in left-half region 604,
preferably using a method known as the Summation of
the Squared Difference. A right-eye position search step
508 is conducted in right-half region 606, preferably
based on the same Summation of the Squared Difference
method.

[0044] Left and right eye position search steps 506
and 508 and the summation of the squared difference
method are now more particularly described with reference
to Figures 7 and 8.

[0045] In general, the summation of the squared difference
method involves calculating the summation of
the squared difference of the intensity values of the corresponding
pixels in an eye template and a patch of the
image that has the same size as the template. In this
method, each pixel in the patch of pixels has a corresponding
pixel in the template. The difference between
the intensity level of each of the corresponding pixels is
calculated. Each difference is then squared. The sum of
each of the squared differences for each of the pixels in
the set is then calculated. This summation of the
squared differences provides a relative measure of the
degree of correspondence between each of the pixel
sets measured and the template. The eye template itself
is generated by averaging a large number of sample eye
images.
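The computation described in this paragraph reduces to a few lines. The sketch below assumes the patch and the template are equally sized lists of rows of intensity values; the function name `ssd` is illustrative.

```python
def ssd(patch, template):
    """Summation of squared differences between two equally sized 2-D arrays."""
    return sum((p - t) ** 2
               for patch_row, template_row in zip(patch, template)
               for p, t in zip(patch_row, template_row))


score = ssd([[1, 2], [3, 4]], [[1, 1], [1, 1]])  # 0 + 1 + 4 + 9
```

A lower score indicates a closer match; a score of zero means the patch is identical to the template.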

[0046] For example, as shown in Figure 8, a window
800 is centered at each cluster center 600 in a respective
half-region of the image (604, 606). Window 800 has
a size which covers substantially the entire cluster about 
which it centers. An eye template 806 is a template of 
an average eye, and moves within window 800. 
[0047] As applied in the present invention, summation 
of the squared difference values are calculated for each 
pixel in each window in each half region. These values 
are compared and the pixel having the lowest relative 
summation of the squared difference value is identified 
as an eye location for the half-region. This process is 
performed separately on the clusters of the left and the 
right-half regions of the image in the manner described 
below. 

[0048] It will be noted that while the present invention 
has been described as using the summation of the 
squared difference method to identify the best relative 
match between the average eye template and each of 
the pixels, other methods to determine the degree of rel- 
ative correspondence can be used. In particular, the 
mean-squared error method can be used in place of the 
summation of the squared difference method. 
[0049] Referring now to Figures 7 and 8, left and right 
eye position search steps 506 and 508 are started with 
centering window 800 at each cluster center 810 in a 
respective half region (step 700). The operation of cal- 
culating the summation of the squared differences (step 
702) is then performed, separately, using a patch of pix- 
els centered on each of the pixels in each window 800 
(step 704). The position of the pixel having the lowest 
summation of squared difference value in each window 
800 is recorded (step 706). When this process has been 
completed for every cluster in a half region (step 708), 
the position of the pixel having the lowest summation of 
squared difference value for the half region is recorded 
(step 709). This position is the eye position for the half- 
region. 
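Steps 700 to 709 can be sketched as a nested search: for every cluster center in a half region, a template-sized patch is centered on every pixel of the window, and the pixel with the lowest summation of squared differences over all windows is kept. The sketch assumes integer cluster centers, a square window given by a hypothetical `window_half` parameter, and that patches extending past the image border are skipped; the specification does not address these points.

```python
def ssd(patch, template):
    """Summation of squared differences between two equally sized 2-D arrays."""
    return sum((p - t) ** 2
               for prow, trow in zip(patch, template)
               for p, t in zip(prow, trow))


def extract_patch(image, center_row, center_col, height, width):
    """Return the height x width patch centered at a pixel, or None at borders."""
    top = center_row - height // 2
    left = center_col - width // 2
    if (top < 0 or left < 0
            or top + height > len(image) or left + width > len(image[0])):
        return None
    return [row[left:left + width] for row in image[top:top + height]]


def find_eye(image, template, cluster_centers, window_half):
    """Return ((row, col), score) of the best template match in a half region."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, None
    for cr, cc in cluster_centers:                      # one window per cluster
        for r in range(cr - window_half, cr + window_half + 1):
            for c in range(cc - window_half, cc + window_half + 1):
                patch = extract_patch(image, r, c, th, tw)
                if patch is None:
                    continue
                score = ssd(patch, template)
                if best_score is None or score < best_score:
                    best_pos, best_score = (r, c), score
    return best_pos, best_score


# Toy example: a dark 3x3 "eye" embedded in a bright 8x8 image.
template = [[10, 10, 10], [10, 0, 10], [10, 10, 10]]
image = [[100] * 8 for _ in range(8)]
for dr in range(3):
    for dc in range(3):
        image[3 + dr][4 + dc] = template[dr][dc]
eye_pos, eye_score = find_eye(image, template, [(3, 4)], window_half=2)
```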

[0050] That is, the digital face image is separated into 
right half region 606 and left half region 604, and each 
iris pixel cluster is associated with either the right half 
region or the left half region. Eye template 806 is de- 
fined, and window 800 is centered at the center of each 
iris pixel cluster. An image patch is defined as having a 
size substantially (preferably, exactly) equal to the size 
of the eye template. Then, to locate the right eye position 
in the right half region, the pixel intensity level difference 
is determined between the eye template and the image 
patch, with the image patch being centered at each pixel
in each window, and the window being centered at each
cluster in the right half region. Similarly, to locate the left 
eye position in the left half region, the pixel intensity level 
difference is determined between the eye template and 
the image patch, with the image patch being centered 
at each pixel in each window, and the window being cen- 
tered at each cluster in the left half region. 
[0051] It will be appreciated that the summation of the
squared difference method of steps 506 and 508 of Figure 5
can also be performed without the use of face
region extraction. In such an embodiment, the skin
colored region can be divided into a left-half region and 
a right-half region. Iris color pixel clusters can then be 
divided into left-half region and right-half region clusters. 
The summation of the squared difference method can 
then be applied. 

[0052] Fourth step 212 and fifth step 214 are used in
finding a mouth position, and are more particularly described
in Figures 9(a)-9(f). The input image to the step
of extracting salient pixels and forming a signature curve 
(fourth step 212) is the original color face image. 
[0053] Referring to Figure 9(a), a morphological
opening operation (step 901) is first applied to the image
to eliminate bright spots such as reflections from eyeglasses
or spectacles. The opening operation preserves dark
facial features such as eyes, nose and mouth. To extract
salient pixels, the input image is then processed in step
902 with a high boost filter that is a type of high pass
filter. The high boost filtering process is accomplished
by convolving the image with a high boost filter kernel
950 as shown in Figure 9(b). High boost filter kernel 950
comprises a parameter H to be selected by the user.
The action of a high pass filter is to remove flat intensity
regions and retain places with high activity (i.e., intensity
contrast between a dark region and a bright region).
Examples of high activity places are shown as salient
pixels 954 in Figure 9(c).

[0054] Salient pixels 954 are the result of thresholding
the high boosted image into a binary image. Accordingly,
step 904 comprises thresholding the high boosted image
into a binary image to generate a binary image with
salient pixels. An example of a binary image obtained
after thresholding the high boost filtered original image
is shown in Figure 9(c) as 960. Non-salient pixels are
set to zeros. The example value of the parameter, H, of
high boost filter kernel 950 is chosen as 9.
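Steps 902 and 904 can be illustrated with a small sketch. Because Figure 9(b) is not reproduced in the text, the kernel used here is an assumption: a common textbook 3x3 high boost form with H at the center, -1 elsewhere, and a 1/9 scale factor. The threshold value and the use of the absolute filter response are likewise hypothetical choices, and border pixels are simply left non-salient.

```python
def high_boost_kernel(h=9):
    """Assumed 3x3 high boost kernel: (1/9) * [[-1,-1,-1],[-1,H,-1],[-1,-1,-1]]."""
    kernel = [[-1.0] * 3 for _ in range(3)]
    kernel[1][1] = float(h)
    return [[v / 9.0 for v in row] for row in kernel]


def filter3x3(image, kernel):
    """Apply a symmetric 3x3 kernel to interior pixels; borders stay 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += kernel[dr + 1][dc + 1] * image[r + dr][c + dc]
            out[r][c] = acc
    return out


def salient_mask(image, h=9, threshold=50.0):
    """Binary image: 1 where the absolute filtered response exceeds threshold."""
    filtered = filter3x3(image, high_boost_kernel(h))
    return [[1 if abs(v) > threshold else 0 for v in row] for row in filtered]


# A single dark pixel on a flat bright background is the only salient pixel.
gray = [[100] * 5 for _ in range(5)]
gray[2][2] = 0
mask = salient_mask(gray)
```

In this toy image the flat neighborhood yields a response of about 22 and is suppressed, while the dark pixel yields about -89 and is retained.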
[0055] Following step 904 is step 906 comprising pro- 
jecting salient pixels 954 onto a vertical axis to obtain a 
signature curve 956, as illustrated in Figure 9(c). The 
projection is accomplished by counting the number of 
salient pixels 954 in the horizontal direction and then 
assigning the number to the corresponding position on the 
vertical axis. It is evident that there are more salient 
pixels in the eye and mouth regions than in any other 
regions. Therefore, the result of projecting salient pixels 
954 is signature curve 956 with peaks signifying the 
places of mouth and eye regions. 
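The projection of step 906 amounts to a per-row count of salient pixels. A minimal sketch, assuming the binary image is a list of rows in which salient pixels are non-zero (the function name and the toy image are hypothetical):

```python
def signature_curve(binary):
    """Count the salient (non-zero) pixels in each row of a binary image."""
    return [sum(1 for v in row if v) for row in binary]


# Toy binary image: an eye row, a sparse nose row and a mouth row.
binary = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],   # eyes
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],   # nose
    [0, 1, 1, 1, 1, 0],   # mouth
    [0, 0, 0, 0, 0, 0],
]
curve = signature_curve(binary)
```

Each entry of `curve` corresponds to one position on the vertical axis, so the eye and mouth rows appear as the largest values.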
[0056] Thus, the search for the mouth position in the vertical 
direction becomes a search for a peak position on 
signature curve 956. This search is performed in step 
908. However, before the search of step 908 is conducted, 
binary image 960 is divided into an upper half region 
962 and a lower half region 964 as shown in Figure 9(d). 
The divider is Face_center_row 308 obtained in face 
region extraction step 203. The search of step 908 is 
then performed in lower half region 964 of binary image 
960 where the mouth resides. 

[0057] The section of signature curve 956 in lower half 
region 964 may need to be smoothed a few times to re- 
move spurs before the search takes place. Figure 9(e) 
shows a smoothed signature curve 966 with a plurality 
of peak points 968. 

[0058] The smoothing operation of signature curve 
956 to generate smoothed signature curve 966 can be 
performed, for example, using a moving average filter 
or a median filter. A peak is located at position i if the 
following condition is satisfied: 

S(i-1) < S(i) > S(i+1) 

where S(x) is the value of smoothed signature curve 966 
at a position x. Typically, more than one peak 
position is found in lower half region 964 of binary image 
960. The position of the highest peak value is recorded, 
because the mouth region most likely has more salient 
pixels 954 than other facial features such as a nose. The 
recorded peak position is subsequently used as the 
mouth position in the vertical direction. 
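The search of step 908 may be sketched as follows, using a moving average for the smoothing and treating a peak as a sample strictly greater than both of its neighbors. The window radius, the number of smoothing passes, the shrinking of the window at the curve ends and the function names are assumptions made for illustration.

```python
def moving_average(curve, radius=1):
    """Smooth a curve with a centered moving average; the window shrinks at the ends."""
    n = len(curve)
    out = []
    for i in range(n):
        lo, hi = max(i - radius, 0), min(i + radius + 1, n)
        out.append(sum(curve[lo:hi]) / (hi - lo))
    return out


def find_peaks(s):
    """Indices i where s[i] is strictly greater than both neighbors."""
    return [i for i in range(1, len(s) - 1) if s[i - 1] < s[i] > s[i + 1]]


def mouth_row(curve, face_center_row, passes=2):
    """Return the row of the highest peak below face_center_row, or None if no peak."""
    lower = curve[face_center_row:]          # lower half region only
    for _ in range(passes):                  # "smoothed a few times"
        lower = moving_average(lower)
    peaks = find_peaks(lower)
    if not peaks:
        return None
    best = max(peaks, key=lambda i: lower[i])
    return face_center_row + best


# Toy signature curve: eye peak near row 2, small nose bump near row 6,
# larger mouth peak near row 10. The face center row is assumed to be 4.
curve = [0, 5, 6, 1, 0, 2, 3, 2, 1, 7, 9, 8, 2, 0]
row = mouth_row(curve, face_center_row=4)
```

Restricting the search to the samples below the face center row keeps the eye peak out of the comparison, so only the nose and mouth peaks compete.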
[0059] Referring again to Figure 9(a), the position of 
the mouth in the horizontal direction is determined at 
step 910 wherein the middle position is found between 
the two eyes detected in third step 208. This position is 
also used in step 912 wherein the eye-mouth coordina- 
tion is validated. In step 912, the identified horizontal 
and vertical positions of the mouth are used as a starting 
point to group salient pixels 954 in the neighborhood of 
the identified mouth position into a mouth salient pixel 
cluster. That is, the salient pixels generally surrounding 
the identified mouth position are grouped into a mouth 
salient pixel cluster. Referring to Figure 9(f), a distance 
M between a left and right boundary of the mouth salient 
pixel cluster is determined. A distance E between the 
two eye positions identified in third step 208 is determined. 
Further, a distance D from an eye level (i.e., an 
imaginary line drawn between the two eye positions) to 
a mouth level (i.e., an imaginary line drawn between the 
left and right boundary of the mouth salient pixel cluster) 
is also determined. 
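One possible way to group the salient pixels around the identified mouth position and to measure the distance M is sketched below. The specification does not state how the grouping is performed; the 8-connected flood fill, the seed radius around the starting point, the definition of M as the difference between the extreme column indices, and the function names are all assumptions.

```python
from collections import deque


def mouth_cluster(binary, start, radius=1):
    """Collect salient pixels 8-connected to any salient pixel within `radius` of `start`."""
    rows, cols = len(binary), len(binary[0])
    r0, c0 = start
    seeds = [
        (r, c)
        for r in range(max(r0 - radius, 0), min(r0 + radius + 1, rows))
        for c in range(max(c0 - radius, 0), min(c0 + radius + 1, cols))
        if binary[r][c]
    ]
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and binary[nr][nc] and (nr, nc) not in seen):
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return seen


def mouth_width(cluster):
    """Distance M between the left and right boundary columns of the cluster."""
    columns = [c for _, c in cluster]
    return max(columns) - min(columns) if columns else 0


binary = [
    [0, 1, 1, 0, 0, 1, 1, 0],   # eyes
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],   # nose
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],   # mouth
    [0, 0, 0, 0, 0, 0, 0, 0],
]
cluster = mouth_cluster(binary, start=(4, 3))
m = mouth_width(cluster)
```

Here the nose pixel is not connected to the mouth row, so it is left out of the cluster.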

[0060] A level of confidence of the detected eye-mouth 
coordination can be estimated, for example, using 
a ratio of M to E or E to D, that is, by determining whether 
the ratio of M to E or E to D is within a predetermined 
range. For example, a high confidence level is estimated 
if the ratio of M to E is in a range of 0.89 to 0.99, or 
alternatively, if the ratio of E to D is in a range of 0.9 to 
1.1. These ranges are derived from the statistics found 
in "Anthropometry of the Head and Face" by Leslie G. 
Farkas, incorporated herein by reference. 
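The ratio test may be illustrated as follows, using the example ranges given above. The sketch assumes positions given as (row, column) pairs, takes the eye level as the mean of the two eye rows (which presumes a roughly upright face) and treats degenerate distances as low confidence; these choices and the function name are hypothetical.

```python
import math


def high_confidence(left_eye, right_eye, mouth_row, mouth_width,
                    me_range=(0.89, 0.99), ed_range=(0.9, 1.1)):
    """Return True if M/E or, alternatively, E/D falls within its example range."""
    # E: distance between the two eye positions.
    e = math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])
    # D: distance from the eye level to the mouth level (assumed vertical).
    eye_level = (left_eye[0] + right_eye[0]) / 2
    d = abs(mouth_row - eye_level)
    if e == 0 or d == 0:
        return False
    m_to_e = mouth_width / e
    e_to_d = e / d
    return (me_range[0] <= m_to_e <= me_range[1]
            or ed_range[0] <= e_to_d <= ed_range[1])


# E = 60, D = 60, M = 56: both ratios lie inside the example ranges.
plausible = high_confidence((40, 30), (40, 90), mouth_row=100, mouth_width=56)
# E = 60, D = 120, M = 30: neither ratio lies inside its range.
implausible = high_confidence((40, 30), (40, 90), mouth_row=160, mouth_width=30)
```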

[0061] The subject matter of the present invention relates 
to digital image understanding technology, which 
is understood to mean technology that digitally processes 
a digital image to recognize and thereby assign useful 
meaning to human understandable objects, attributes 
or conditions and then to utilize the results obtained 
in the further processing of the digital image. 
[0062] In this manner, the present invention provides 
an efficient method for detecting normally appearing hu- 
man eyes and mouth in a digital image. 

Claims 

1. A digital image processing method for locating eyes 
and mouth in a digital face image, comprising the 
steps of: 

a) detecting a plurality of iris colored pixels in 
the digital face image; 

b) grouping the plurality of iris colored pixels into 
iris pixel clusters; 

c) detecting eye positions using the iris pixel 
clusters; 

d) identifying salient pixels relating to a facial 
feature in the digital face image; 

e) generating a signature curve using the sali- 
ent pixels; and 

f) using the signature curve and the eye posi- 
tions to locate a mouth position. 

2. The method of Claim 1, wherein the step of detecting 
the plurality of iris colored pixels in the digital 
face image comprises the steps of: 

a) performing a color histogram equalization of 
the digital face image based on a mean intensity 
analysis of the digital face image; 

b) identifying a plurality of skin color regions; 

c) identifying a face region from the plurality of 
skin color regions; and 

d) examining pixels in the face region to detect 
the plurality of iris colored pixels. 

3. The method of Claim 1, wherein the step of detecting 
eye positions comprises the steps of: 

a) defining an eye template having a size; 

b) defining an image patch having a size sub- 
stantially equal to the size of the eye template; 

c) determining a center of each iris pixel cluster; 

d) defining a window for each iris pixel cluster, 
the window being centered at the center of each 
iris pixel cluster; 

e) separating the digital face image into a right 
half region and a left half region; 

f) associating each iris pixel cluster with either 
the right half region or the left half region; 

g) locating a right eye position in the right half 
region by, for each iris pixel cluster disposed in 
the right half region, centering the image patch 
on each pixel in the window and 
determining a pixel intensity level difference between 
the eye template and the image patch; and 

h) locating a left eye position in the left half re- 
gion by, for each iris pixel cluster disposed in 
the left half region, centering the image patch 
on each pixel in the window and 
determining a pixel intensity level difference be- 
tween the eye template and the image patch. 

4. The method of Claim 1, wherein the step of identifying 
salient pixels comprises the steps of: 

a) morphologically smoothing the digital face 
image to generate a smoothed face image; 

b) high-boost filtering the smoothed face image 
to generate a filtered face image; and 

c) thresholding the filtered face image into a bi- 
nary image having salient pixels. 

5. The method of Claim 1, wherein the step of using 
the signature curve and the eye positions to locate 
a mouth position comprises the steps of: 

a) locating a position midway between the eye 
positions; 

b) defining the midway position as a horizontal 
coordinate of the mouth; 

c) locating a bottom peak position on the sig- 
nature curve; and 

d) defining the bottom peak position as a vertical 
coordinate of the mouth. 

6. A digital image processing method for locating eyes 
and mouth in a digital face image, comprising the 
steps of: 

a) detecting a plurality of iris colored pixels in 
the digital face image; 

b) grouping the plurality of iris colored pixels in- 
to iris pixel clusters; 

c) detecting eye positions using the iris pixel 
clusters; 

d) identifying salient pixels relating to a facial 
feature in the digital face image; 

e) generating a signature curve using the sali- 
ent pixels; 

f) finding peaks of the signature curve; 

g) using the signature curve and the eye posi- 
tions to locate a mouth position; and 



h) validating the eyes and mouth position. 

7. The method of Claim 27, wherein the step of detecting 
the plurality of iris colored pixels in the digital 
face image comprises the steps of: 

a) performing a color histogram equalization of 
the digital face image based on a mean inten- 
sity analysis of the digital face image; 
b) identifying a plurality of skin color regions; 

c) identifying a face region from the plurality of 
skin color regions; and 

d) examining pixels in the face region to detect 
the plurality of iris colored pixels. 

8. The method of Claim 27, wherein the step of detect- 
ing eye positions comprises the steps of: 

a) determining a center of each iris pixel cluster; 
b) defining a window for each iris pixel cluster, 
the window being centered at the center of each 
iris pixel cluster, the window having a size sufficient 
to cover the iris pixel cluster; 

c) separating the digital face image into a right 
half region and a left half region; 

d) associating each iris pixel cluster with either 
the right half region or the left half region; 

e) locating a right eye position in the right half 
region by determining a pixel intensity level difference 
between an average eye and an image 
patch, the image patch having a size substantially 
equal to a size of the average eye, the image 
patch being centered at each pixel in the 
window, the window being centered at each iris 
pixel cluster in the right half region; and 

f) locating a left eye position in the left half region 
by determining a pixel intensity level difference 
between the average eye and the image 
patch, the image patch being centered at 
each pixel in the window, the window being centered 
at each iris pixel cluster in the left half region. 

9. The method of Claim 27, wherein the step of identifying 
salient pixels comprises the steps of: 

a) morphologically smoothing the digital face 
image to generate a smoothed face image; 

b) high-boost filtering the smoothed face image 
to generate a filtered face image; and 

c) thresholding the filtered face image into a bi- 
nary image having salient pixels. 

10. The method of Claim 27, wherein the step of validating 
the eyes and mouth position comprises the 
steps of: 

a) grouping the salient pixels surrounding the 
mouth position to define a mouth salient pixel 
cluster; 

b) calculating a distance M between a left 
boundary and a right boundary of the mouth salient 
pixel cluster; 

c) calculating a distance E between the eye 
positions; 

d) determining a first ratio of M to E; 

e) determining whether the first ratio is within a 
predetermined first range; 

f) calculating a distance D between an eye level 
position and a mouth level position; 

g) determining a second ratio of E to D; and 

h) determining whether the second ratio is within 
a predetermined second range. 